loads packages:

library(tidyverse)
library(gapminder)
library(here)
library(plotly)

Exercise 1: Explain the value of the here::here package

here::here gives you the ability to code a path name that is compatible across different operating systems. Different operating systems have different syntax for paths. The here::here package enables others to run your code without requiring them to rewrite the paths for their specific platform syntax. It does so by finding the root directory then the specific path desired in a format that is independent of the platform being used. The code will still run if you move the file since you do not need to change the relative directory. The code will still run if you open files outside of your Rstudio project. The here::here package is beneficial as it simplifies the use of sub-directories for projects.

Exercise 2: Factor management

BEFORE REORDER: Lets take the gapminder dataset and look continents to explore. Using the class function we can check that continents is indeed a factor, then check how many levels and what the levels are. The first plot shows the order of the levels. The arrange function will order continent by the factor levels.

class(gapminder$continent)
## [1] "factor"
nlevels(gapminder$continent)
## [1] 5
levels(gapminder$continent)
## [1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania"
gapminder %>% 
  ggplot(aes(continent, pop, fill= continent)) +
  geom_col()

gapminder %>%
  arrange(continent) %>%
  DT::datatable()

REORDER: We will first drop Oceania from the dataset using filter in a new dataframe called gapminder1. …reorder the levels according to ascending population using fct_reorder function to arrage the levels in a variable called continent_relevel. In the new dataframe gapminder1 the continent_relevels replace the old levels. We can now check the class, number of levels, and levels of gapminder1, and plot the dataframe containing the new levels. The arrange function will order continent by the new factor levels.

gapminder1 <- gapminder %>%
  filter(continent == "Asia"| continent == "Americas" | continent ==  "Africa"| continent ==  "Europe") 

levels(gapminder1$continent)  #check levels after Oceania removal
## [1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania"
gapminder1 <-gapminder1 %>%
  droplevels()
levels(gapminder1$continent) #check levels after Oceania removal and dropping unused levels
## [1] "Africa"   "Americas" "Asia"     "Europe"
continent_relevel <-fct_reorder(gapminder1$continent, gapminder1$pop, max) %>%
  levels()
gapminder1$continent <- factor(gapminder1$continent, levels = continent_relevel)
  
 #check class, number of levels, levels after Oceania removal and reorder
class(gapminder1$continent)  
## [1] "factor"
nlevels(gapminder1$continent)
## [1] 4
levels(gapminder1$continent)
## [1] "Europe"   "Africa"   "Americas" "Asia"
gapminder1 %>% 
  ggplot(aes(continent, pop, fill= continent)) +
  geom_col()

gapminder1 %>%
  arrange(continent) %>%
  DT::datatable()

Exercise 3: File input/output (I/O)

Lets take gapminder and select only 3 columns of country, continent, and population in a new dataframe called gapminder2. Lets then export it, then import it in another dataframe called gapminder3. We can add factor levels for continent and country in order of ascending population.

gapminder2 <- gapminder %>%
  select(country, continent, pop)
gapminder2 %>%
  DT::datatable()
write_csv(gapminder2, here::here("gapminder2.csv"))

gapminder3 <-read_csv(here::here("gapminder2.csv"))
gapminder3 %>%
  DT::datatable()
continent_relevel3 <-fct_reorder(gapminder3$continent, gapminder3$pop, max) %>%
  levels()
gapminder3$continent <- factor(gapminder3$continent, levels = continent_relevel3)
country_relevel3.1 <-fct_reorder(gapminder3$country, gapminder3$pop, max) %>%
  levels()
gapminder3$country <- factor(gapminder3$country, levels = country_relevel3.1)

# check levels after continent and country reordering
levels(gapminder3$continent)
## [1] "Oceania"  "Europe"   "Africa"   "Americas" "Asia"
levels(gapminder3$country) %>%
  head()
## [1] "Sao Tome and Principe" "Iceland"               "Djibouti"             
## [4] "Equatorial Guinea"     "Bahrain"               "Comoros"

The file was successfully exported and imported.

Exercise 4: Visualization design

Here is my sad assignment 2 plot:

ggplot(gapminder %>%
  filter( country == 'Canada'), aes(gdpPercap, lifeExp)) +
  geom_point(colour = 'blue') +
  scale_x_log10('log(gdpPercap)') 

Lets revamp it by adding a linear regression line, making the background white, adding new axis labels, and making it interactive with the plotly function… Looking fresh.

newPlot <-ggplot(gapminder %>%
  filter( country == 'Canada'), aes(gdpPercap, lifeExp)) +
  geom_smooth(method = "lm", colour = 'yellow') +
  geom_point(colour = 'blue') +
  scale_x_log10('log(gdpPercap)') +
  ylab("Life Expectancy") +
  xlab("log(GDP Per Capita)") +
  theme_bw()

newPlot %>%
  ggplotly()

Exercise 5: Writing figures to file

Lets save the last plot as a png.

ggsave("newPlot.png", width = 5, height = 3, units = "in")

Insert plot image: pic

gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows